We propose a domain adaptation method, MoDA, which adapts a pretrained embodied agent to a new, noisy environment without ground-truth supervision. Map-based memory provides important contextual information for visual navigation and exhibits a distinctive spatial structure composed mainly of flat walls and rectangular obstacles. Our adaptation approach exploits these inherent regularities of the estimated maps to guide the agent in overcoming the pervasive domain discrepancy in a novel environment. Specifically, we propose an efficient learning curriculum that handles the visual and dynamics corruptions in an online manner, self-supervised with pseudo-clean maps generated by style transfer networks. Because the map-based representation provides spatial knowledge for the agent's policy, our formulation can deploy policy networks pretrained in simulators in a new setting. We evaluate MoDA in various practical scenarios and show that our proposed method quickly enhances the agent's performance in downstream tasks including localization, mapping, exploration, and point-goal navigation.
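To make the pseudo-clean-map self-supervision concrete, here is a minimal sketch of one adaptation step, assuming a PyTorch occupancy mapper and a frozen style-transfer network; all identifiers are hypothetical placeholders, not the authors' implementation.

```python
# A minimal sketch of self-supervised map adaptation, assuming `mapper` outputs
# an egocentric occupancy map and `style_transfer` is a frozen network that
# restores the flat-wall/rectangular-obstacle regularities; names are placeholders.
import torch
import torch.nn.functional as F

def adaptation_step(mapper, style_transfer, rgb_obs, optimizer):
    """One online adaptation step in the noisy target environment."""
    noisy_map = mapper(rgb_obs)                    # map estimated under corruption
    with torch.no_grad():
        pseudo_clean = style_transfer(noisy_map)   # pseudo-clean supervision target
    loss = F.l1_loss(noisy_map, pseudo_clean)      # self-supervised reconstruction loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```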
Recent advances in vision-language pretraining have demonstrated striking performance across diverse vision-language tasks, shedding light on the long-standing problem in AI research of achieving a comprehensive understanding of visual and textual concepts. In the medical domain, however, the limited amount and diversity of data have hindered the successful learning of joint vision-language concepts. In this study, we introduce MAX-VL, a model tailored for effective vision-language pretraining in the medical domain. We experimentally demonstrate that the pretrained MAX-VL model outperforms current state-of-the-art vision-language models on various vision-language tasks. We also suggest its clinical utility for diagnosing newly emerging diseases and detecting human error, and show the model's wide applicability to data from different domains.
Compared with 2-D ultrasound (US), which images a single axial plane, 3-D US imaging systems can visualize a volume along three axial planes. This allows a complete anatomical view, which is useful for gynecological (GYN) and obstetric (OB) applications. Unfortunately, 3-D US has inherent resolution limitations compared with 2-D US. In 3-D US systems with a 3-D mechanical probe, for example, the image quality along the beam direction is comparable, but significant degradation is typically observed in the other two axial image planes. To address this, we propose a novel unsupervised deep learning approach to improve 3-D US image quality. In particular, using {\em unmatched} high-quality 2-D US images as a reference, we train the recently proposed switchable CycleGAN architecture so that every mapped plane in 3-D can learn the image quality of the 2-D US images. Thanks to the switchable architecture, our network also provides real-time control of the enhancement level according to user preference, which is ideal for user-centric scanner setups. Extensive experiments with clinical evaluation confirm that our method offers significantly improved image quality as well as user-friendly flexibility.
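As an illustration of the switchable idea, the sketch below shows one common way such real-time control is realized: an interpolation coefficient blends two style codes feeding AdaIN layers inside the generator. The names and shapes are illustrative assumptions, not the authors' code.

```python
# A minimal sketch of user-controllable enhancement via AdaIN code blending;
# `code_identity` and `code_enhance` stand for learned style codes and are
# hypothetical placeholders.
import torch
import torch.nn as nn

class AdaIN(nn.Module):
    def forward(self, x, gamma, beta):
        mu = x.mean(dim=(2, 3), keepdim=True)
        sigma = x.std(dim=(2, 3), keepdim=True) + 1e-5
        return gamma * (x - mu) / sigma + beta     # re-style normalized features

def switchable_code(code_identity, code_enhance, alpha):
    """alpha = 0 keeps the input look; alpha = 1 applies the full 2-D-US
    image-quality transfer; intermediate values blend the two in real time."""
    return (1 - alpha) * code_identity + alpha * code_enhance
```

For instance, alpha = 0.5 would apply the learned enhancement at half strength without retraining or re-running the network offline.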
Owing to the difficulty of obtaining ground-truth depth for omnidirectional (360°) images, the quality and quantity of existing 360° depth data are insufficient to represent the variety of scenes in the world. Consequently, 360° depth estimation research that relies entirely on supervised learning is bound to yield unsatisfactory results. Although self-supervised learning methods focusing on equirectangular images (EIs) have been introduced, they often produce incorrect or non-unique solutions, leading to unstable performance. In this paper, we propose 360° monocular depth estimation methods that improve on these previously studied areas. First, we introduce a self-supervised 360° depth learning method that exploits only gravity-aligned videos, which has the potential to eliminate the need for depth data during training. Second, we propose a joint learning scheme realized by combining supervised and self-supervised learning; the weaknesses of each are compensated, leading to more accurate depth estimation. Third, we propose a non-local fusion block, which can further retain the global information encoded by a vision transformer when reconstructing depth. With the proposed methods, we successfully apply transformers to 360° depth estimation, which, to the best of our knowledge, had not been attempted before. On several benchmarks, our method achieves significant improvements over previous works and establishes the state of the art.
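A minimal sketch of what such a joint objective can look like, assuming a per-pixel supervised depth term on labeled frames combined with a self-supervised photometric residual from view reprojection; the loss form and the weighting `lam` are illustrative assumptions, not the paper's exact formulation.

```python
# A hedged sketch of a joint supervised + self-supervised depth loss;
# `photo_residual` stands for a precomputed reprojection error (e.g., SSIM + L1).
import torch

def joint_loss(pred_depth, gt_depth, photo_residual, lam=0.5, mask=None):
    sup = torch.abs(pred_depth - gt_depth)         # supervised term on labeled pixels
    if mask is not None:                           # ignore pixels without ground truth
        sup = sup[mask]
    return sup.mean() + lam * photo_residual.mean()
```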
The number of international benchmarking competitions is steadily increasing in various fields of machine learning (ML) research and practice. So far, however, little is known about the common practice as well as the bottlenecks faced by the community in tackling the research questions posed. To shed light on the status quo of algorithm development in the specific field of biomedical image analysis, we designed an international survey that was issued to all participants of challenges conducted in conjunction with the IEEE ISBI 2021 and MICCAI 2021 conferences (80 competitions in total). The survey covered participants' expertise and working environments, their chosen strategies, as well as algorithm characteristics. A median of 72% of challenge participants took part in the survey. According to our results, knowledge exchange was the primary incentive (70%) for participation, while the reception of prize money played only a minor role (16%). While a median of 80 working hours was spent on method development, a large portion of participants (32%) stated that they did not have enough time for method development, and 25% perceived the infrastructure to be a bottleneck. Overall, 94% of all solutions were deep learning-based; of these, 84% were based on standard architectures. 43% of the respondents reported that the data samples (e.g., images) were too large to be processed at once. This was most commonly addressed by patch-based training (69%), downsampling (37%), and solving 3D analysis tasks as a series of 2D tasks. K-fold cross-validation on the training set was performed by only 37% of the participants, and only 50% performed ensembling, based on multiple identical models (61%) or heterogeneous models (39%). 48% of the respondents applied postprocessing steps.
Large language models (LLMs) have been shown to be able to perform new tasks based on a few demonstrations or natural language instructions. While these capabilities have led to widespread adoption, most LLMs are developed by resource-rich organizations and are frequently kept from the public. As a step towards democratizing this powerful technology, we present BLOOM, a 176B-parameter open-access language model designed and built thanks to a collaboration of hundreds of researchers. BLOOM is a decoder-only Transformer language model that was trained on the ROOTS corpus, a dataset comprising hundreds of sources in 46 natural and 13 programming languages (59 in total). We find that BLOOM achieves competitive performance on a wide variety of benchmarks, with stronger results after undergoing multitask prompted finetuning. To facilitate future research and applications using LLMs, we publicly release our models and code under the Responsible AI License.
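Since the models are publicly released, a short usage sketch with the Hugging Face transformers library is shown below; the full 176B checkpoint requires multi-GPU hardware, so the smaller 560M variant from the same BLOOM release is used here for illustration.

```python
# Loading a BLOOM checkpoint from the Hugging Face Hub and generating text;
# swap in "bigscience/bloom" for the full 176B model given sufficient hardware.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bigscience/bloom-560m")
model = AutoModelForCausalLM.from_pretrained("bigscience/bloom-560m")

inputs = tokenizer("The ROOTS corpus is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=30)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```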
Video super-resolution is one of the most popular tasks on mobile devices, being widely used for automatic improvement of low-bitrate and low-resolution video streams. While numerous solutions have been proposed for this problem, they are usually quite computationally demanding, demonstrating low FPS rates and poor power efficiency on mobile devices. In this Mobile AI challenge, we address this problem and task the participants with designing an end-to-end real-time video super-resolution solution for mobile NPUs optimized for low energy consumption. The participants were provided with the REDS training dataset containing video sequences for a 4X video upscaling task. The runtime and power efficiency of all models were evaluated on the powerful MediaTek Dimensity 9000 platform with a dedicated AI processing unit capable of accelerating floating-point and quantized neural networks. All proposed solutions are fully compatible with the above NPU, demonstrating up to 500 FPS at a power consumption of 0.2 [Watt / 30 FPS]. A detailed description of all models developed in the challenge is provided in this paper.
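For flavor, the sketch below shows the kind of NPU-friendly design such solutions typically use: a shallow convolutional trunk followed by a depth-to-space (pixel-shuffle) layer for 4X upscaling. It is an illustrative toy model, not any particular submitted solution.

```python
# A toy NPU-friendly 4X upscaler: plain convolutions plus depth_to_space,
# both of which map well onto mobile AI accelerators.
import tensorflow as tf

def tiny_vsr_4x(h=180, w=320):
    inp = tf.keras.Input(shape=(h, w, 3))
    x = tf.keras.layers.Conv2D(16, 3, padding="same", activation="relu")(inp)
    x = tf.keras.layers.Conv2D(16, 3, padding="same", activation="relu")(x)
    x = tf.keras.layers.Conv2D(3 * 16, 3, padding="same")(x)  # 3 * 4^2 channels
    out = tf.nn.depth_to_space(x, 4)                          # 4X spatial upscaling
    return tf.keras.Model(inp, out)
```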
Image super-resolution is a common task on mobile and IoT devices, where one often needs to upscale and enhance low-resolution images and video frames. While numerous solutions have been proposed for this problem in the past, they are usually not compatible with low-power mobile NPUs having many computational and memory constraints. In this Mobile AI challenge, we address this problem and task the participants with designing an efficient quantized image super-resolution solution that can demonstrate real-time performance on mobile NPUs. The participants were provided with the DIV2K dataset and trained INT8 models to perform high-quality 3X image upscaling. The runtime of all models was evaluated on the Synaptics VS680 Smart Home board with a dedicated edge NPU capable of accelerating quantized neural networks. All proposed solutions are fully compatible with the above NPU, demonstrating up to 60 FPS when reconstructing Full HD resolution images. A detailed description of all models developed in the challenge is provided in this paper.
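A hedged sketch of how such INT8 models are commonly produced, using full-integer post-training quantization in TFLite; `model` and `calibration_images` are placeholders for a trained super-resolution network and a small set of DIV2K patches.

```python
# Full-integer post-training quantization with the TFLite converter; the
# representative dataset calibrates activation ranges for INT8 inference.
import tensorflow as tf

def representative_dataset():
    for img in calibration_images:          # a few hundred input patches suffice
        yield [img[None].astype("float32")]

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8
tflite_model = converter.convert()
open("sr_int8.tflite", "wb").write(tflite_model)
```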
Existing score-based generative models (SGMs) can be categorized into constrained SGMs (CSGMs) and unconstrained SGMs (USGMs) according to their parameterization approach. CSGMs model the probability density as a Boltzmann distribution and assign their predictions as the negative gradient of some scalar-valued energy function. USGMs, on the other hand, employ flexible architectures capable of directly estimating scores without explicitly modeling an energy function. In this paper, we demonstrate that the architectural constraints of CSGMs may limit their score-matching ability. In addition, we show that the inability of USGMs to preserve the property of conservativeness can lead to severe sampling inefficiency and degraded sampling performance in practice. To address these issues, we propose quasi-conservative score-based generative models (QCSGMs), which retain the advantages of both CSGMs and USGMs. Our theoretical derivations show that the training objective of QCSGMs can be integrated efficiently into the training process by leveraging the Hutchinson trace estimator. In addition, experimental results on the CIFAR-10, CIFAR-100, ImageNet, and SVHN datasets validate the effectiveness of QCSGMs. Finally, we justify the advantage of QCSGMs with an example based on a single-layer autoencoder.
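For reference, the Hutchinson estimator approximates the trace of the score Jacobian, tr(∂s/∂x), as E_v[vᵀ(∂s/∂x)v] over random Rademacher vectors v, requiring one vector-Jacobian product per sample instead of the full Jacobian. A minimal PyTorch sketch, illustrative rather than the authors' implementation:

```python
# Hutchinson trace estimation of the score Jacobian with Rademacher probes;
# `score_fn` maps x to a score of the same shape (batch dim first assumed).
import torch

def hutchinson_trace(score_fn, x, n_samples=1):
    x = x.requires_grad_(True)
    s = score_fn(x)
    est = 0.0
    for _ in range(n_samples):
        v = torch.randint(0, 2, x.shape, device=x.device).float() * 2 - 1
        (vjp,) = torch.autograd.grad(s, x, grad_outputs=v, create_graph=True)
        est = est + (v * vjp).sum(dim=tuple(range(1, x.dim())))  # v^T J v
    return est / n_samples                     # one trace estimate per batch item
```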
This paper proposes a real-time model predictive control (MPC) scheme for executing multiple tasks with robots over a finite time horizon. In industrial robot applications, multiple constraints must be considered carefully to avoid joint position, velocity, and torque limits. In addition, singularity-free and smooth motions are required to execute tasks continuously and safely. Instead of formulating a nonlinear MPC problem, we devise linear MPC problems using kinematic and dynamic models linearized along nominal trajectories generated by a hierarchical controller. These linear MPC problems are solvable via quadratic programming, which greatly reduces the computation time of the proposed MPC framework; the resulting update frequency exceeds 1 kHz. Compared with an operational space control (OSC) baseline, our proposed MPC framework is more effective at reducing task tracking errors. We validate the approach in numerical simulations and in real experiments using an industrial manipulator. More specifically, we deploy the method in two practical scenarios for robotic logistics: 1) controlling a robot carrying a heavy payload while accounting for torque limits, and 2) controlling the end-effector while avoiding singularities.
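To illustrate the linear-MPC-as-QP idea, the sketch below regulates deviations from a nominal trajectory with a discrete-time double integrator standing in for the linearized robot model, subject to an input-limit constraint; all matrices, weights, and limits are placeholder assumptions, not the paper's setup.

```python
# Linear MPC as a quadratic program: minimize tracking deviation and control
# effort over a finite horizon under linearized dynamics and input bounds.
import cvxpy as cp
import numpy as np

dt, N = 0.001, 20                          # 1 kHz step, 20-step horizon
A = np.array([[1.0, dt], [0.0, 1.0]])      # linearized state matrix
B = np.array([[0.0], [dt]])                # linearized input matrix

x = cp.Variable((2, N + 1))
u = cp.Variable((1, N))
x0 = np.array([0.1, 0.0])                  # current deviation from the nominal

cost = cp.sum_squares(x) + 1e-2 * cp.sum_squares(u)
constr = [x[:, 0] == x0]
for k in range(N):
    constr += [x[:, k + 1] == A @ x[:, k] + B @ u[:, k],
               cp.abs(u[:, k]) <= 5.0]     # torque-limit analogue
cp.Problem(cp.Minimize(cost), constr).solve(solver=cp.OSQP)
print("first control input:", u.value[:, 0])
```

Because the problem stays a QP, solvers like OSQP return in microseconds at this size, which is what makes kilohertz update rates attainable.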